A practical method to detect SNVs and indels from whole genome and exome sequencing data
Abstract
The recent development of massively parallel sequencing technology has allowed the creation of comprehensive catalogs of genetic variation. However, due to the relatively high sequencing error rate for short read sequence data, sophisticated analysis methods are required to obtain high-quality variant calls. Here, we developed a probabilistic multinomial method for the detection of single nucleotide variants (SNVs) as well as short insertions and deletions (indels) in whole genome sequencing (WGS) and whole exome sequencing (WES) data for single sample calling. Evaluation with DNA genotyping arrays revealed a concordance rate of 99.98% for WGS calls and 99.99% for WES calls. Sanger sequencing of the discordant calls determined the false positive and false negative rates for the WGS (0.0068% and 0.17%) and WES (0.0036% and 0.0084%) datasets. Furthermore, short indels were identified with high accuracy (WGS: 94.7%, WES: 97.3%). We believe our method can contribute to a greater understanding of human diseases.
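The abstract does not spell out the multinomial model, so the following is only a minimal illustrative sketch of how a multinomial likelihood can be used to call a diploid genotype at a single site. It is not the authors' method: the function names, the uniform per-base `error_rate`, and the choice of ten unordered diploid genotypes are all assumptions made for illustration.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"


def genotype_log_likelihoods(counts, error_rate=0.01):
    """Multinomial log-likelihood of observed base counts under each diploid genotype.

    counts: dict mapping base -> number of reads showing that base at the site.
    error_rate: assumed uniform sequencing error rate (hypothetical; must be > 0).
    """
    n = sum(counts.get(b, 0) for b in BASES)
    # Multinomial coefficient; identical for every genotype, kept for completeness.
    log_coeff = math.lgamma(n + 1) - sum(
        math.lgamma(counts.get(b, 0) + 1) for b in BASES
    )
    results = {}
    for a1, a2 in combinations_with_replacement(BASES, 2):
        loglik = log_coeff
        for b in BASES:
            # Probability of reading base b: each allele is sampled with
            # probability 0.5 and read correctly with probability 1 - error_rate;
            # an error yields each of the other three bases equally often.
            p = sum(
                0.5 * ((1 - error_rate) if allele == b else error_rate / 3)
                for allele in (a1, a2)
            )
            loglik += counts.get(b, 0) * math.log(p)
        results[a1 + a2] = loglik
    return results


def call_genotype(counts, error_rate=0.01):
    """Return the genotype with the highest log-likelihood."""
    ll = genotype_log_likelihoods(counts, error_rate)
    return max(ll, key=ll.get)
```

Under these assumptions, a site with balanced support for two bases is called heterozygous, while a single discordant read among many is absorbed by the error term. A practical caller would additionally use base and mapping qualities, genotype priors, and indel-specific handling, none of which are modeled here.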
Similar resources
Lightning-fast genome variant detection with GROM
Current human whole genome sequencing projects produce massive amounts of data, often creating significant computational challenges. Different approaches have been developed for each type of genome variant and method of its detection, necessitating users to run multiple algorithms to find variants. We present Genome Rearrangement OmniMapper (GROM), a novel comprehensive variant detection algori...
Whole-genome sequencing is more powerful than whole-exome sequencing for detecting exome variants.
We compared whole-exome sequencing (WES) and whole-genome sequencing (WGS) in six unrelated individuals. In the regions targeted by WES capture (81.5% of the consensus coding genome), the mean numbers of single-nucleotide variants (SNVs) and small insertions/deletions (indels) detected per sample were 84,192 and 13,325, respectively, for WES, and 84,968 and 12,702, respectively, for WGS. For bo...
BALSA: integrated secondary analysis for whole-genome and whole-exome sequencing, accelerated by GPU
This paper reports an integrated solution, called BALSA, for the secondary analysis of next generation sequencing data; it exploits the computational power of GPU and an intricate memory management to give a fast and accurate analysis. From raw reads to variants (including SNPs and Indels), BALSA, using just a single computing node with a commodity GPU board, takes 5.5 h to process 50-fold whol...
Performance comparison of four commercial human whole-exome capture platforms
Whole exome sequencing (WXS) is widely used to identify causative genetic mutations of diseases. However, not only have several commercial human exome capture platforms been developed, but substantial updates have been released in the past few years. We report a performance comparison for the latest release of four commercial platforms, Roche/NimbleGen's SeqCap EZ Human Exome Library v3.0, Illu...
Development of Genetic Markers in Eucalyptus Species by Target Enrichment and Exome Sequencing
The advent of next-generation sequencing has facilitated large-scale discovery, validation and assessment of genetic markers for high density genotyping. The present study was undertaken to identify markers in genes supposedly related to wood property traits in three Eucalyptus species. Ninety four genes involved in xylogenesis were selected for hybridization probe based nuclear genomic DNA tar...